ABSTRACT

There are at least two situations in which the
behavioral scientist wishes to transform uniformly distributed data
into normally distributed data: (1) in studies of sampling
distributions where uniformly distributed pseudo-random numbers are
generated by a computer but normally distributed numbers are desired,
and (2) in measurement applications where standardization of an
instrument requires that percentile ranks be transformed into
normally distributed standard scores. The problem investigated in
this study is to find z when given P(z). The difficulty is that
expressions which approximate the integral from minus infinity to z
are not readily solvable for z. A number of investigators have
derived algebraic approximations to the inverse Gaussian. The most
widely used algebraic approximations of the inverse Gaussian function
are those derived by Hastings. The Hastings approximations are valid
only for values of P(z) greater than 0.50, and a computer program
must make logical provisions for the situation where P(z) < 0.50. Burr
approached the problem through the use of cumulative moment theory
and also derived two approximations. Burr's approximations have the
advantage that they are valid for all values of P > 0. They are also
conveniently expressed in one FORTRAN statement. It was the objective
of Byars and Roscoe to develop an approximation of the inverse
Gaussian which was both more accurate and more efficient than
previous transformations. A final expression was obtained from the
solution of approximately 4300 equations in six unknowns. The three
sets of approximations were compared on accuracy and computational
efficiency, and the Byars-Roscoe approximation was found to be
superior to the others. (CK)

RATIONAL APPROXIMATIONS OF THE INVERSE GAUSSIAN FUNCTION
Jackson A. Byars and John T. Roscoe, Kansas State University 



BACKGROUND 



There are at least two situations in which the behavioral scientist wishes
to transform uniformly distributed data into normally distributed data:

(1) In studies of sampling distributions where uniformly distributed
pseudo-random numbers are generated by a computer but normally
distributed numbers are desired. Such applications frequently
involve Monte Carlo studies in which very large samples of data
are drawn. In such cases computational efficiency becomes a prime
criterion for normalization functions.

(2) In measurement applications where standardization of an instrument
requires that percentile ranks be transformed into normally distributed
standard scores. In such applications accuracy as well as computational
efficiency is of importance.

In either situation it would be desirable to have an efficient, accurate 
procedure for use in computer programs. 

The standard normal cumulative distribution function is given by: 

$$P(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2} \, dt$$

The problem investigated in this study is to find z given P(z). The 
difficulty is that expressions which approximate the integral from minus infinity
to z are not readily solvable for z. It is possible to obtain a value for z 
which is as accurate as may be desired by means of making successive approximations 
of the value of the integral for various values of z. Such procedures are 
computationally inefficient and are not considered in this paper. 
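
For contrast with the algebraic approximations that follow, the successive-approximation procedure just described can be sketched in Python. This is only an illustration, not code from the paper: the function names, the bracket [-10, 10], the tolerance, and the use of `math.erf` to evaluate P(z) are all assumptions made for the example.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function P(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_by_bisection(p, lo=-10.0, hi=10.0, tol=1e-10):
    """Find z with P(z) = p by repeatedly halving the bracket [lo, hi].

    Returns the estimate of z and the number of CDF evaluations used.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    steps = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normal_cdf(mid) < p:
            lo = mid          # the root lies above mid
        else:
            hi = mid          # the root lies at or below mid
        steps += 1
    return 0.5 * (lo + hi), steps

z, steps = inverse_by_bisection(0.975)
print(f"z = {z:.6f} after {steps} evaluations of P(z)")
```

Each transformed value costs several dozen evaluations of the integral, which is why such procedures are set aside when very large samples must be normalized.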

A number of investigators have derived algebraic approximations to the
inverse Gaussian. These approximations are of varying degrees of accuracy and
computational efficiency. In this paper an algebraic expression is presented
which is more accurate than previous expressions over the range 0.01 ≤ P(z) ≤ 0.99
and is more computationally efficient.

HASTINGS' APPROXIMATIONS

The most widely used algebraic approximations of the inverse Gaussian
function are those derived by Hastings (1). He used Chebyshev polynomials to
derive two approximations, the first simpler than the second but yielding a less
accurate approximation. The Hastings approximations are valid only for values
of P(z) greater than 0.50, and a computer program must make logical provisions
for the situation where P(z) < 0.50. Since the approximations are found on sheets
67 and 68 respectively of Hastings' book, they are referred to in this paper as
Hastings(67) and Hastings(68) respectively.



Hastings(67)

for P(z) ≥ 0.50

$$\eta = \sqrt{\ln \frac{1}{(1 - P(z))^2}}$$

$$z = \eta - \frac{a_0 + a_1 \eta}{1 + b_1 \eta + b_2 \eta^2}$$

a_0 = 2.30753        b_1 = 0.99229
a_1 = 0.27061        b_2 = 0.04481

Hastings(68) with P(z) and η as above

$$z = \eta - \frac{a_0 + a_1 \eta + a_2 \eta^2}{1 + b_1 \eta + b_2 \eta^2 + b_3 \eta^3}$$

a_0 = 2.515517       b_1 = 1.432788
a_1 = 0.802853       b_2 = 0.189269
a_2 = 0.010328       b_3 = 0.001308

For purposes of this investigation, the above formulae were written in
FORTRAN IV as follows:



      IF (P .EQ. 0.500) P = 0.50000001
      C = ABS(P-.5)
      T = SQRT(ALOG(1./(.5-C)**2))
      D = (P-.5)/C
      H67 = D*(T-((2.30753+0.27061*T)/(1.+(0.99229+0.04481*T)*T)))
      H68 = D*(T-((2.515517+(.802853+.010328*T)*T)/(1.+(1.432788+
     &(.189269+.001308*T)*T)*T)))

If either of the two approximations were calculated alone, the first four
lines would still be required, although it would be possible to incorporate the
calculation of D, which only provides the sign, into the statement in which the
approximation is calculated.
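
The FORTRAN statements above can be transcribed into Python roughly as follows. This is a sketch for readers without a FORTRAN compiler: the function names and the helper `_hastings` are hypothetical, and the sign is taken directly from a comparison rather than from D = (P-.5)/C, which removes the need to nudge P away from 0.5.

```python
import math

def _hastings(p, numerator, denominator):
    """Shared preliminary steps: fold p into the upper tail, then restore the sign."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    c = abs(p - 0.5)
    t = math.sqrt(math.log(1.0 / (0.5 - c) ** 2))   # eta in the formulas
    sign = 1.0 if p >= 0.5 else -1.0                 # plays the role of D
    return sign * (t - numerator(t) / denominator(t))

def hastings_67(p):
    """Hastings sheet 67: two-term numerator, quadratic denominator."""
    return _hastings(
        p,
        lambda t: 2.30753 + 0.27061 * t,
        lambda t: 1.0 + (0.99229 + 0.04481 * t) * t,
    )

def hastings_68(p):
    """Hastings sheet 68: quadratic numerator, cubic denominator."""
    return _hastings(
        p,
        lambda t: 2.515517 + (0.802853 + 0.010328 * t) * t,
        lambda t: 1.0 + (1.432788 + (0.189269 + 0.001308 * t) * t) * t,
    )

for p in (0.025, 0.5, 0.975):
    print(p, round(hastings_67(p), 5), round(hastings_68(p), 5))
```

Both functions share the logarithm and square root in `_hastings`, which mirrors the observation that the first four FORTRAN lines are needed whichever approximation is used.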



BURR'S APPROXIMATIONS

Burr approached the problem through the use of cumulative moment theory
and also derived two approximations (2). Burr's approximations have the advantage
that they are valid for all values of P > 0. They also are conveniently expressed
in one FORTRAN statement. Burr's less accurate approximation is found in formula
6 and the more accurate in formula 7; they are hereafter referred to as Burr(6)
and Burr(7).

Burr(6)

$$z = \frac{\left[ (1 - P)^{-1/6.158} - 1 \right]^{1/4.874} - 0.644693}{0.161984}$$

Burr(7)

$$z = \frac{\left[ (1 - P)^{-1/6.158} - 1 \right]^{1/4.874} - \left[ P^{-1/6.158} - 1 \right]^{1/4.874}}{0.323968}$$

These expressions appear even simpler when written in FORTRAN statements:

      A = -1./6.158
      B = 1./4.874
      B6 = (((1.-P)**A-1.)**B - 0.644693)/0.161984
      B7 = (((1.-P)**A-1.)**B - (P**A-1.)**B)/0.323968
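
A Python transcription of the two Burr statements is sketched below. The function names and the argument check are additions for illustration; the constants are those of the FORTRAN statements. The check restricts P to the open interval (0, 1) because the Burr(7) term P**A is undefined at P = 0 in Python.

```python
A = -1.0 / 6.158   # exponent applied to (1 - P) and to P
B = 1.0 / 4.874    # exponent applied to the bracketed term

def _check(p):
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")

def burr_6(p):
    """Burr formula 6: one bracketed term, shifted and scaled."""
    _check(p)
    return (((1.0 - p) ** A - 1.0) ** B - 0.644693) / 0.161984

def burr_7(p):
    """Burr formula 7: difference of the upper-tail and lower-tail terms."""
    _check(p)
    upper = ((1.0 - p) ** A - 1.0) ** B
    lower = (p ** A - 1.0) ** B
    return (upper - lower) / 0.323968

for p in (0.025, 0.5, 0.975):
    print(p, round(burr_6(p), 5), round(burr_7(p), 5))
```

Burr(7) is antisymmetric about P = 0.5 by construction, so it returns exactly zero there, whereas Burr(6) does not. Burr(7) also performs four real-to-real exponentiations against two for Burr(6).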

BYARS AND ROSCOE'S APPROXIMATION



It was the objective of these investigators to develop an approximation
of the inverse Gaussian which was both more accurate and more efficient than
previous transformations. In the early stages of the investigation both polynomial
expressions and rational polynomial expressions were considered. The rational
expressions showed more promise, and in the latter stages of the investigation
only such expressions were considered. In the early stages of the investigation
a number of variable transformations were also considered. It was noted that the
transformation R = P - .5 resulted in the terms of even degree vanishing from
the numerator of the rational expression and in terms of odd degree vanishing
from the denominator. This made it possible to obtain an expression containing
high powers of R, but with only half the number of terms in numerator and
denominator that might be expected.

The coefficients of the rational expression were found by using a least 
squares approach with successive trials having greater weightings on points 
at which the approximation was least accurate. The final expression was obtained 
from the solution of approximately 4300 equations in six unknowns. At that time 
approximately 3400 of the points considered were between .01 and .03 or between 
.97 and .99. 

The approximation derived is as follows:

for R = P - .5 and 0 ≤ P ≤ 1

$$z = \frac{a_1 R + a_3 R^3 + a_5 R^5}{1 + b_2 R^2 + b_4 R^4 + b_6 R^6}$$

where

a_1 = 2.505922       b_2 = -7.337743
a_3 = -15.73223      b_4 = 14.97266
a_5 = 23.54337       b_6 = -6.016088



This expression is written with the following FORTRAN statements: 

      R = P - 0.5000000
      Q = R*R
      BR = ((2.505922+(-15.73223+23.54337*Q)*Q)*R)/(1.0+(-7.337743+
     &(14.97266-6.016088*Q)*Q)*Q)

Note that nested multiplication is used in this expression, as well as
in the Hastings approximations, in order to avoid exponentiation, which is an
expensive computation.
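
The nested multiplication can be made explicit with a short Python sketch. The function names are hypothetical; `byars_roscoe` follows the FORTRAN statements, while `byars_roscoe_powers` writes out the same rational expression with explicit powers purely for comparison.

```python
def byars_roscoe(p):
    """Byars-Roscoe approximation evaluated by nested multiplication."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    r = p - 0.5
    q = r * r                      # only even powers of R appear after factoring
    numerator = (2.505922 + (-15.73223 + 23.54337 * q) * q) * r
    denominator = 1.0 + (-7.337743 + (14.97266 - 6.016088 * q) * q) * q
    return numerator / denominator

def byars_roscoe_powers(p):
    """Same expression with explicit exponentiation, for comparison only."""
    r = p - 0.5
    numerator = 2.505922 * r - 15.73223 * r**3 + 23.54337 * r**5
    denominator = 1.0 - 7.337743 * r**2 + 14.97266 * r**4 - 6.016088 * r**6
    return numerator / denominator

for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(p, round(byars_roscoe(p), 5), round(byars_roscoe_powers(p), 5))
```

The nested form needs seven multiplications, one division, and no exponentiation, and no logarithm or square root is required beforehand.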

COMPARISON OF APPROXIMATIONS

The three sets of approximations were compared on two sets of criteria: 
accuracy and computational efficiency. 

Accuracy 

The three sets of approximations were used to calculate the values of the 
z-scores corresponding to each percentile from 1 to 99. These z-scores were
then compared to tabled values (3) for mean and maximum deviation on that range.

TABLE 1

MEAN AND MAXIMUM ERROR ON RANGE 0.01 ≤ P ≤ 0.99 FOR VARIOUS APPROXIMATIONS

Approximation     Mean Error      Maximum Error
                  .01 to .99      .01 to .99

Burr 6            0.00825         0.02206
Burr 7            0.00117         0.00356
Hastings 67       0.00185         0.00279
Hastings 68       0.00028         0.00044
Byars-Roscoe      0.00004         0.00010


The above table shows marked superiority for the Byars-Roscoe approximation
on the range 0.01 ≤ P ≤ 0.99. In the regions 0.001 ≤ P ≤ 0.009 and 0.991 ≤ P ≤ 0.999, the
Hastings approximations maintained their approximate mean and maximum error
characteristics. The Burr approximations were less accurate in the tails than
over the rest of the range, but the Burr 7 approximation was more accurate
than either the Burr 6 or the Byars-Roscoe, both of which had errors in the
tenths place at 0.001 and 0.999.
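
The accuracy comparison can be repeated with the Python sketch below. It is an assumption-laden reconstruction rather than the authors' procedure: the five functions are transcriptions of the FORTRAN statements under hypothetical names, and the reference values come from `statistics.NormalDist().inv_cdf` (Python 3.8 or later) instead of the printed table cited as (3), so the figures it prints may differ from Table 1 in the last digits.

```python
import math
from statistics import NormalDist

A, B = -1.0 / 6.158, 1.0 / 4.874          # Burr exponents

def burr_6(p):
    return (((1.0 - p) ** A - 1.0) ** B - 0.644693) / 0.161984

def burr_7(p):
    return (((1.0 - p) ** A - 1.0) ** B - (p ** A - 1.0) ** B) / 0.323968

def _eta_and_sign(p):
    """Preliminary Hastings steps: eta and the sign of z."""
    c = abs(p - 0.5)
    eta = math.sqrt(math.log(1.0 / (0.5 - c) ** 2))
    return eta, (1.0 if p >= 0.5 else -1.0)

def hastings_67(p):
    t, s = _eta_and_sign(p)
    return s * (t - (2.30753 + 0.27061 * t) / (1.0 + (0.99229 + 0.04481 * t) * t))

def hastings_68(p):
    t, s = _eta_and_sign(p)
    num = 2.515517 + (0.802853 + 0.010328 * t) * t
    den = 1.0 + (1.432788 + (0.189269 + 0.001308 * t) * t) * t
    return s * (t - num / den)

def byars_roscoe(p):
    r = p - 0.5
    q = r * r
    return ((2.505922 + (-15.73223 + 23.54337 * q) * q) * r) / (
        1.0 + (-7.337743 + (14.97266 - 6.016088 * q) * q) * q)

def error_summary(approx):
    """Mean and maximum absolute error over the percentiles 1 to 99."""
    exact = NormalDist().inv_cdf
    errors = [abs(approx(k / 100.0) - exact(k / 100.0)) for k in range(1, 100)]
    return sum(errors) / len(errors), max(errors)

for f in (burr_6, burr_7, hastings_67, hastings_68, byars_roscoe):
    mean_err, max_err = error_summary(f)
    print(f"{f.__name__:<13} mean {mean_err:.5f}  max {max_err:.5f}")
```

Changing the range in `error_summary` to percentiles such as 0.1 through 0.9 would allow the tail behavior described above to be examined in the same way.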

Computational efficiency

The three sets of approximations were compared for computational 
efficiency by means of timing a large number of executions of the required 
FORTRAN statements. The approximations were programmed as shown above. The 
timing was supplied by an IBM supplied subroutine called INTIME. By calling 
this subroutine before and after the completion of a DO LOOP which contained 
the given approximation, it was possible to time to within one one-hundredth of 
a second the length of time that the DO LOOP executed. Each of the five 
approximations was placed within a DO LOOP which executed 1000 times and 
again within a loop which executed 2000 times. For comparison purposes a 
DO LOOP with no interior statements was also executed 1000 and 2000 times. 

These loops were executed using both the "fast core" and "slow core" options
available on the IBM 360/50 at the Kansas State University Computing Center. 
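
The timing procedure can be imitated on a modern machine with the sketch below. It is an analogy only: `time.perf_counter` stands in for the INTIME subroutine, a Python `for` loop stands in for the DO LOOP, the function and helper names are hypothetical, and the absolute times will bear no relation to those measured on the IBM 360/50.

```python
import time

def byars_roscoe(p):
    """Byars-Roscoe approximation, transcribed from the FORTRAN statements."""
    r = p - 0.5
    q = r * r
    return ((2.505922 + (-15.73223 + 23.54337 * q) * q) * r) / (
        1.0 + (-7.337743 + (14.97266 - 6.016088 * q) * q) * q)

def time_empty_loop(n):
    """Seconds taken by a loop of n iterations with no interior statements."""
    start = time.perf_counter()
    for _ in range(n):
        pass
    return time.perf_counter() - start

def time_loop(func, n, p=0.9):
    """Seconds taken by a loop that evaluates func(p) n times."""
    start = time.perf_counter()
    for _ in range(n):
        func(p)
    return time.perf_counter() - start

for n in (1000, 2000):
    print(n, f"empty {time_empty_loop(n):.6f} s",
          f"Byars-Roscoe {time_loop(byars_roscoe, n):.6f} s")
```

Timing the empty loop separately, as the authors did, shows how much of each figure is loop overhead rather than the approximation itself.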



TABLE 2

TIME IN SECONDS FOR EXECUTING VARIOUS APPROXIMATIONS

Approximation     1000 Slow Core   2000 Slow Core   1000 Fast Core   2000 Fast Core

Empty DO LOOP          0.10             0.23             0.03             0.05
Burr 6                 5.04             9.91             1.87             3.88
Burr 7                 9.49            18.68             3.65             8.03
Hastings 67            2.11             4.78             0.93             1.96
Hastings 68            2.26             4.64             0.97             2.06
Byars-Roscoe           0.47             1.01             0.27             0.53



As can be noted in Table 2, the Byars-Roscoe approximation is faster than
the Burr approximations by an order of magnitude and approximately four times as
fast as the Hastings approximations. It can further be noted that the Burr 7
approximation takes about twice as long as does the Burr 6 formula. This
indicates that the primary computational cost is the additional raising of a
real number to a real power. It would further seem that the primary cost in
the Hastings formulae is in the preliminary steps, since the addition of extra
terms increases the cost only slightly.

CONCLUSIONS 

On the basis of both accuracy and computational efficiency, the
Byars-Roscoe approximation is markedly superior to the approximations provided by
Burr and by Hastings on the range 0.01 ≤ P ≤ 0.99. In all cases in which
the scores of interest fall within the given range, that approximation should
be used.

Within the ranges 0.001 ≤ P ≤ 0.009 and 0.991 ≤ P ≤ 0.999, the Hastings 68
formula is superior in accuracy and should be used if a substantial portion of
the scores of interest fall within these ranges and must be accurately transformed.
The computational efficiency of the Hastings 68 is sufficiently close to that
of the Hastings 67 that the more accurate approximation should always be used.